Index-Supported Similarity Search Using Multiple Representations

Authors

Johannes Aßfalg

Michael Kats

Hans-Peter Kriegel

Peter Kunath

Alexey Pryakhin

Abstract

Similarity search in complex databases is of utmost interest in a wide range of application domains. Often, complex objects are described by several representations. The combination of these different representations usually contains more information than any single representation. In our work, we introduce the use of an index structure in combination with a negotiation-theory-based approach for deriving a suitable subset of representations for a given query object. This most promising subset of representations is determined in an unsupervised way at query time. We experimentally show that this approach significantly increases the efficiency of the query processing step. At the same time, the effectiveness, i.e., the quality of the search results, is equal to or even higher than that of standard combination methods.

Similar resources

Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines

Vector representations and vector space modeling (VSM) play a central role in modern machine learning. In our recent research we proposed a novel approach to ‘vector similarity searching’ over dense semantic vector representations. This approach can be deployed on top of traditional inverted-index-based fulltext engines, taking advantage of their robustness, stability, scalability and ubiquity....

PP-Index: Using Permutation Prefixes for Efficient and Scalable Approximate Similarity Search

We present the Permutation Prefix Index (PP-Index), an index data structure that supports efficient approximate similarity search. The PP-Index belongs to the family of permutation-based indexes, which represent each indexed object by “its view of the surrounding world”, i.e., a list of the elements of a set of reference objects sorted by their distance order with r...
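The permutation idea in this teaser can be illustrated with a minimal sketch. It only shows how an object might be summarized by the order of its nearest reference objects; it is not the PP-Index itself, whose structure the truncated abstract does not describe. The function name `permutation_prefix`, the use of Euclidean distance, and the sample points are all assumptions made for illustration.

```python
import math

def permutation_prefix(obj, references, prefix_len):
    """Return the indices of the prefix_len closest reference objects,
    ordered from nearest to farthest (hypothetical helper)."""
    # Assumption: Euclidean distance; ties are broken by reference index.
    order = sorted(
        range(len(references)),
        key=lambda i: (math.dist(obj, references[i]), i),
    )
    return tuple(order[:prefix_len])

# Three illustrative reference objects in the plane.
references = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

near_origin = permutation_prefix((1.0, 2.0), references, 2)
also_near_origin = permutation_prefix((2.0, 3.0), references, 2)
far_right = permutation_prefix((9.0, 1.0), references, 2)
```

In this sketch, the two points near the origin share the same prefix while the distant point does not, which is the intuition behind using such prefixes as a grouping key for approximate search.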

Learning Binary Codes For Efficient Large-Scale Music Similarity Search

Content-based music similarity estimation provides a way to find songs in the unpopular “long tail” of commercial catalogs. However, state-of-the-art music similarity measures are too slow to apply to large databases, as they are based on finding nearest neighbors among very high-dimensional or non-vector song representations that are difficult to index. In this work, we adopt recent machine le...

Efficient Similarity Search on Vector Sets

Similarity search in database systems is becoming an increasingly important task in modern application domains such as multimedia, molecular biology, medical imaging, computer aided design and many others. Whereas most of the existing similarity models are based on feature vectors, there exist some models which use very complex object representations such as trees and graphs. A promising way be...

Incremental All Pairs Similarity Search for Varying Similarity Thresholds with Reduced I/O Overhead

All Pairs Similarity Search (APSS) is the problem of finding all pairs of records with similarity scores above a specified threshold. Incremental All Pairs Similarity Search (IAPSS) is the problem of performing APSS multiple times over the same dataset by varying the similarity threshold. This problem is ubiquitous in many real-world systems like search engines, online social networks, and digi...
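The APSS definition above can be made concrete with a brute-force sketch. It is a naive quadratic baseline, not the method of the listed paper; cosine similarity, the function names, and the sample records are assumptions chosen for illustration.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def all_pairs_above(records, threshold):
    """Return index pairs (i, j), i < j, whose similarity exceeds the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if cosine(a, b) > threshold
    ]

# Illustrative records; the first two point in nearly the same direction.
records = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0)]

# The incremental setting repeats the search with different thresholds.
strict_pairs = all_pairs_above(records, 0.9)
loose_pairs = all_pairs_above(records, 0.05)
```

Rerunning the full scan for each threshold, as done here, is exactly the repeated work that an incremental approach would try to avoid.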

Publication date: 2008
